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Abstract 

Holland and Leinhardt (1981) proposed a directed random graph model, the pi model, to describe 
dyadic interactions in a social network. In previous work (Petrovic et al., 2010), we studied the algebraic 
properties of the pi model and showed that it is a toric model specified by a multi-homogeneous ideal. We 
conducted an extensive study of the Markov bases for pi that incorporate explicitly the constraint arising 
from multi-homogeneity. Here we consider the properties of the corresponding toric variety and relate them 
to the conditions for the existence of the maximum likelihood and extended maximum likelihood estimators 
or the model parameters. Our results are directly relevant to the estimation and conditional goodness-of-fit 
testing problems inpi models. 

Keywords: Algebraic statistics, Dyadic independence, Exponential random graph model, Holland-Leinhardt 
pi model, Markov basis. 



1 Introduction 

Holland and Leinhardt (1981) derived a statistical model, called pi, for directed random graphs to describe 
dyadic interactions in a social network. The p\ model, one of the earlier and most relevant statistical model 
for the analysis of network data, remains popular in applications, but some of its statistical properties are not 
well understood, including a relevant asymptotics. In the pi model, we represent the network by a directed 
graph where for every pair of nodes (i, j) we can observe four possible configurations: an edge from i to j, an 
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edge from j to i, a bi-directed edge (or both directed edges), or no edge at all. Each pair of nodes, or dyad, is 
in any of these states independently of the other dyads. Although the p\ model is a special case of the broader 
and well understood class of log-linear models, e.g., see Fienberg and Wasserman (1981a); Fienberg et al. 
(1985), it possesses unique features that pose challenges of both theoretical and practical nature. We de- 
scribed some of these in Petrovic et al. (2010), where we revisited the Holland-Leinhardt p\ model using 
the tools and language of algebraic statistics (see, e.g., Diaconis and Sturmfels, 1998; Pistone et al., 2001; 
Drton et al., 2009). The results from that paper are summarized in Section 2. Here we focus on the un- 
derlying geometry of pi models, rather then their algebraic properties. Our main result is Theorem 3.4, 
which gives explicit conditions for the existence of the maximum likelihood estimator and offers algorithmic 
improvement in the study of the existence of MLE. In Section 3.3 we carry out some computations for small 
networks to exemplify our results and illustrate further the subtleties of the estimation problem. 



2 The pi model: notation, description, and structure 

Versions of the pi model prescribe probability distributions over the set of directed graphs on n nodes with 
random directed and bi-directed edges. Each node corresponds to a unit in a given population of interest, 
and the edges are random variables that represent relations or interactions between units. The resulting 
random graph provides a static snapshot of the interactions among a set of agents in the populations, and it 
is often referred to as a network. 

In pi models, the random edge between any pair of nodes i and j, or dyad, is modeled independently from 
all the others. For any dyad defined by the pair of nodes there are four possible edge configurations: 

1. node i has an outgoing edge into node j: i — > j; 

2. node i as an incoming edge originating from node j: i<— j; 

3. there is a bi-directed edge between node i and node j: i < — > j; 

4. there is no edge between nodes i and j. 

Following the notation we established in Petrovic et al. (2010), which is slightly different than the original 
notation of Holland and Leinhardt (1981), for every pair of nodes (i, j) we define the vector 

Pi ,j = (p hJ (0,0),p^(l,0), PhJ (0,l), PlJ (l,l)) G A 3 (1) 

containing the probabilities of the four possible edge types, where A 3 is the standard simplex in R 4 . The 
numbers ^ ^ (1, 0), Pi,j(0, 1) andp,j(l, 1) denote the probabilities of the edge configurations i — > j, i <— j and 
i < — > j, respectively, and Pij(0, 0) is the probability that there is no edge between i and j (thus, 1 denotes 
the outgoing side of the edge). Notice that, by symmetry Pi _j (a, b) = Pjj(b, a), for any a, b G {0, 1} and that 

Pi tj (0,0)+p itj (l,0)+p itj (0,l)+p itj (l,l) = 1. (2) 

The fundamental assumption underlying p\ models is that the dyads are independent. This is formalized 
by modeling each of the (™) dyads as mutually independent draws from multinomial distributions with 
class probabilities Pi _j for i < j. Specifically, the Holland-Leinhardt pi model specifies the multionomial 
probabilities of each dyad in logarithmic form as follows: 

log (0,0) = A,., 

logp^l, 0) = Xij + an + Pj + , . 

log Pi.j (0,1) = Ajj + a 3 - + A + 

logp ij3 -(l, 1) = \j+oti+Pj+a j +f} i +2Q + pi >j . 

The real numbers 0, an, Pi, p i} j and Ajj for all i < j are the model parameters. The parameter a, quantifies 
the effect of an outgoing edge from node i, the parameter (3j instead measures the effect of an incoming edge 
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into node j, while p^j controls the added effect of reciprocated edges (in both directions). The parameter 
9 measures the propensity of the network to have edges and, therefore, controls the "density" of the graph. 
The parameter Ajj functions as a normalizing constant to ensure that (2) holds for each each dyad 
Note that, in order for the model to be identifiable, additional linear constraints need to be imposed on its 
parameters. We refer the interested readers to the original paper on p\ model by Holland and Leinhardt 
(1981) for an extensive interpretation of the model parameters. 

As noted in Fienberg and Wasserman (1981b), different variants of the p\ model can be obtained by 
constraining the model parameters. In Petrovic et al. (2010) we consider three special cases of the basic p\ 
model, which differ in the way the reciprocity parameter is modeled: 

1. pi j = 0, no reciprocal effect; 

2. pi t j = p, constant reciprocation; 

3. pi.j = p + pi + pj, edge-dependent reciprocation. 

As it is often the case with network data, we assume that data become available in the form of one 
observed network. Thus, each dyad is observed in only one of its four possible states and this one 
observation is a random vector in R 4 with a Multinomial(l,pi j:) ) distribution. As a result, data are sparse 
and, even though the dyadic probabilities are strictly positive according to the defining equations (3), only 
some of the model parameters may be estimated from the data. 

For a network on n nodes, we represent the vector of 2n(n — 1) dyadic probabilities as 



V = (Pl,2,Pl,3, ■ • ■ ,Pn-l,n) £ 



p2n(n — 1) 



where, for each i < j, pij is given as in (1). The p\ model is the set of all probability distributions that 
satisfy the Holland-Leinhardt equations (3). Give the settings just described, the pi model is log-linear; 
that is, the set of candidate probabilities p is such that \ogp is in the linear space spanned by the rows of 
some design matrix A, which can be constructed as follows (this construction is by no means unique) . The 
2n(n — 1) columns of A are indexed by the entries of the vectors p^j, i < j, where the pij's are ordered 
lexicographically, and its rows by the model parameters, ordered arbitrarily. The (r, c) entry of A is equal to 
the coefficient of the c-th parameter in the logarithmic expansion of the r-the probability as indicated in (3). 
In particular, notice that the entries of A can only be 0, 1 or 2. For example, in the case p^ = p + pt + Pj, the 
matrix A has (™) + 3n + 3 rows. When n = 3, the design matrix corresponding to this model is 
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Let £ = {\i,j,cii,fij,6,p,pi,pj,i < j} be the vector of model parameters, whose dimension d depends 
on the type of restrictions on the pi. Then, using the design matrix A, one can readily verify that the 
Holland-Leinhardt equations have the log-linear representation 



logp = A (. 
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Equivalently, letting and pk be the k-th column of A and the k-th entry of p, respectively, 

1=1 

which, upon applying the exponential transformation point-wise, is equivalent to 

Pi,j(a,b) = e A '^(e Q ') Q (e Q O b (e ft ) b (e' 3 Q (e e ) a+b (e Pl - J ) min(a ' b) , ^ < j, Va,6e{0,l}. (4) 

We denote by Ma the pi model defined by the design matrix A. It is the relatively open set in the positive 
orthant of K 2 ™(" _1 ) of dimension rank(A) consisting of (™) probability distrbutions satisfying (4). 

Let X n c ]R 2 "( n-1 ' denote the sample space, i.e. the set of all observable networks on n nodes. Then, 
every point x in the sample space X can be written as 

X = (Xl,2,£l,3, ■ • ■ ,X n -X,n), 

where each of the (' 2 l ) subvectors Xij is a vertex of A 3 . Notice that \X n \ = 4"(™ _1 ). 

Remark. This way of representing a network on n nodes as a highly-constrained 0/1 vector of dimension 
2n(n — 1) may appear redundant. Indeed, as in Holland and Leinhardt (1981), we could more naturally 
represent an n-node network using the n x n incidence matrix with 0/1 off-diagonal entries, where the (i, j) 
entry is 1 is there is an edge from i to j and otherwise. While this representation is more intuitive and 
parsimonious (as it only requires n(n — 1) bits), whenever p =/= 0, the sufficient statistics for the reciprocity 
parameter are not linear functions of the observed network. As a consequence, the ajacency matrix repre- 
sentation does not lead directly to a log-linear model. 

For any x e X n , each subvector Xij is the realization of a random vector in R 4 having Multinomial 
distribution with size 1 and class probabilities p\ rj e A 3 , where 

P° = (pi2,Pl,3,---,P°n-l,n) 

is an unknown vector in Ma (thus, p° > 0). Furthermore, (3) implies the Multinomial vectors representing 
the dyad configurations are mutually independent. Then, for any point x e X n , the likelihood function is the 
function I: Ma —> [0, 1] given by 

4(p)=n(n^( fc r jW ) C5) 

i<j \k=i j 

and the maximum likelihood estimate, or MLE, of p° is 

P = argmax pe Ma £ x (p). (6) 

Despite £ x being smooth and concave on its domain for each x G X n , there exist points x e X n for which 
the (unique) supremum of l x is achieved on the boundary of Ma, and, therefore, it will have some zero 
coordinates. In this case, the MLE of p is said not to exist. Indeed, if a vector p with zero entries is to 
satisfy the Holland and Leinhardt likelihood equations, then some of the model parameters must be equal 
to — oo. Furthermore, nonexistence of the MLE implies that only certain linear combinations of the natural 
parameters, or, equivalently, that only certain entries of p, are estimable (see, e.g., Rinaldo et al., 2009). 
While it is well known that nonexistence of the MLE is a potential issue, very little progress has been made 
since the work of Haberman (1977) on the problems of deciding whether the MLE is non-existent and which 
parameters are estimable if the MLE does not exist. 
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2.1 Markov bases 



We briefly recall the main results from Petrovic et al. (2010), describing the Markov bases for the different 
versions of the p\ model. 

By a fundamental theorem of Markov bases Diaconis and Sturmfels (1998), these correspond to gener- 
ating sets of toric ideals encoded by the design matrices. Our study of Markov moves was motivated by 
noticing special components of the design matrices. As the components that were relevant for that study 
make an appearance in our analysis of MLE, let us introduce them here. 

First, we distinguish the design matrices ("A" above) for two of the the three cases of the p\ model: 
denote the n-node matrices by Z n and £ n for zero and edge-dependent reciprocation, respectively. For each 
case, we consider the matrix of the simplified model obtained by simply forgetting the normalizing constants 
Xij. Denote these simplified matrices by Z„ and £ n . Note that ignoring Ai./s results in zero columns for each 
column indexed by pij (0, 0), and so we are effectively also ignoring all pi_j (0, 0)'s. Hence, the matrices Z n 
and £ n have (™) less rows and (™) less columns than Z n and £ n , respectively. The second special matrix will 
be denoted by A„ and is common to all three cases. It is obtained from Z n or £ n by ignoring the columns 
indexed bypij(l, 1) for all i and j, and then removing any zero rows. 

It turns out that the Markov basis of the toric ideal of the "common submatrix" A n essentially determines 
the Markov bases for the simplified models. More precisely, (Petrovic et al., 2010, Theorem 3.2.) states that 
the ideal of the simplified model defined by Z n can be decomposed into the ideal defined by A n plus another 
ideal T. Here, T consists of moves that replace a bidirected edge between any pair of nodes (i, j) by the two 
edges between them (from i to j and vice versa). Further, (Petrovic et al., 2010, Theorem 3.4.) describes 
the ideal of the simplified model defined by £ n : it can be decomposed into a sum of the ideal defined by A n 
and Q, where Q consists of moves that replace two bidirected edges between two pairs of nodes (i, j) and 
(k, I) by two other bidirected edges between (i, k) and (j, I). 

Our main result used a classical algebraic construction associating to any graph G a toric ideal Iq, as 
follows. Let </> be the map between polynomial rings defined by (j)(xij) = vtvj for each edge {i, j} of G. Its 
kernel Iq is the defining ideal of the edge subring of G. We show that these ideals provide the basic building 
blocks for the essential Markov moves of the p\ model. 

Theorem 2.1 (Petrovic et al. (2010), Theorem 3.6.). The essential Markov moves for the pi model on n nodes 
can be obtained from the Graver basis of la together with the Markov basis for the ideal Ik„ ■ Here, K n is 
the complete graph on n vertices, and G is the subgraph of the bipartite graph K n>n with edge set {vi,Wj} for 
1 < i ^= j < n. 

3 Maximum Likelihood Estimation in pi Models 

In order to derive our results about existence of the MLE forpi models, we first need to describe the geometric 
properties of these models. These properties rely in a fundamental way on the re-parametrization of p\ 
models as log-linear models. Though, as pointed out above, this parametrization is less parsimonious than 
the original one by Holland and Leinhardt (1981), it is more suitable to our geometric analysis. 

3.1 Geometric properties of pi models 

As we explained in Section 2, the statistical model specified by a pi model with design matrix A is the 
set Ma of all vectors satisfying the Holland and Leinhardt equations (3). Let Va C IR 2 ™'"^ 1 ) be the set of 
non-negative points points satisfying (4) . In fact, Va arises as the solution set of a system of polynomial 
equations, and, in the language of algebraic geometry, it is the the toric variety corresponding to the toric 
ideal I a (see Cox et al., 2005; Sturmfels, 1996; Petrovic et al., 2010). It is immediate to see that the closure 
of Ma is precisely the set 

Sa := V A n D n , 
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where D„ = {(pi,2,Pi,3 ■ ■ • ,Pn-i,n) ■ Pij € A 3 ,Vi ^ j}, with A 3 the standard 3-simplex. Any point in Sa 
is comprised of (™) probability distributions over the four possible dyad configurations (0,0), (1,0), (0,1) 
and (1, 1), one for each pair of nodes in the network. However, in addition to all strictly positive dyadic 
probability distributions satisfying (3), Sa contains also probability distributions with zero coordinates that 
are obtained as pointwise limits of points in Ma (see, for example, Geiger et al., 2006). 
The marginal polytope of the pi model with design matrix A is defined as 

P A :={t:t = Ap,peD n }. 

As we will see, Pa plays a crucial role for deciding whether the MLE exists given an observed network x. For 
our purposes, it is convenient to represent Pa as a Minkowski sum of simpler polytopes (for background, see 
Ziegler (1996)). To see this, decompose the matrix A as 

A= [A li2 A h3 ...A n _ lin \, 

where each Aij is the submatrix of A corresponding to the four probabilities 

{p j3 -(0 I 0),p«(l J 0),p j3 -(0 l l),py(l,l)} 

for the (i, j)-dyad. Then, denoting by tne Minkowski sum of polytopes, it follows that 

Pa = y^conv(Ajj). 

i<j 

We will concern ourselves with the boundary of Pa, which we will handle using the following theorem, due 
to Gritzmann and Sturmfels (1993). For a polytope Pcl l! and a vector c e M d , we write 

S(P;c) := {x: x T c > y J c,Vy e P} 

for the set of maximizers x of the inner product c 1 x over P. 

Theorem 3.1. (Gritzmann and Sturmfels, 1993). Let P 1 ,P 2 , . . . ,P k be polytopes in R d and let P = Pi+ P 2 + 
... + Pk- A nonempty subset F of P is a face of P if and only if F = F\ + F 2 + . . . + Fk for some face Fi of Pi 
such that there exists c G M. d with Fi = S(Pi; c) for all i. Furthermore, the decomposition F = F\ + F 2 + . . . + Fk 
of any nonempty face F is unique. 

The marginal polytope Pa and the set Sa are closely related. Indeed, as shown in Morton (2008), Sa 
and Pa are homeomorphic. 

Theorem 3.2. (Morton, 2008). The map p,: Sa — > Pa given by p4 Apis a homemorphism. 

This result is a generalization of the moment map theorem for toric varieties defined by homogeneous 
toric ideals (see Fulton, 1993) and Ewald (1996) to the multi-homogeneous case. In particular, it implies 
that P A = {t: t = Ap,p e Sa}- 

3.2 Existence of the MLE in p 1 models 

We focus on two important problems related to maximum likelihood estimation in pi models that have not 
been thoroughly investigated in both theory and practice. Our results should be compared, in particular, 
with the ones given in Haberman (1977). The geometric properties of Sa and Pa established above are 
fundamental to our analysis. 

We begin with a result justifying the name for the marginal polytope. 

Lemma 3.3. The polytope Pa is the convex hull of the set of all possible observable marginals; in symbols: 

Pa = conv({£: t = Ax,x E X n }). 
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Proof. If t = Ax for some x £ X n , then, by definition, t is in the Minkowski sum of conv(Aij)'s for all i < j, 
yielding that conv({£ : t = Ax, x £ X n }) is a subset of Pa- Conversely, by theorem 3.1, any extreme point v 
of Pa can be written as v = + ^1,3 + . . . + u n -i,nj where each is an extreme point of conv(Aij) such 
that Vij = 5(conv(Ajj); c) for some vector c and all i < j. Since, for every i ^ j, the columns of Ajj are 
affinely independent, they are also the extreme points of conv(Aij). Therefore, the extreme points of Pa are 
a subset of {t : t = Ax, x £ A'n}, which implies that Pa is a subset of conv({i : t = Ax, x e X n }). ■ 

Let x denote the observed network. From standard statistical theory of exponential families (see, e.g., 
Barndorff-Nielsen, 1978; Brown, 1986; Lauritzen, 1996), the MLE p exists if and only if the vector of margins 
t = Ax belongs to the relative interior of Pa, denoted by ri(PA). Furthermore, it is always unique and 
satisfies the moment equations, which we formulate using the moment map of Theorem 3.2 as 

P = H~\t). (7) 

When the MLE does not exists, the likelihood function still achieves its supremum at a unique point, 
which also satisfies the moment equations (7), and is called the extended MLE. The extended MLE is a point 
on the boundary of Sa and, therefore, contains zero coordinates. Although not easily interpretable in terms 
of the model parameters of the Holland and Leinhardt likelihood equations (3), the extended MLE still 
provides a perfectly valid probabilistic representation of a network, the only difference being that certain 
network configurations have a zero probability of occurring. 

We are concerned with the following two problems that are essential for computing the MLE and extended 
MLE: 

1. decide whether the MLE exists, i.e. whether observed vector of margins t belongs to vi{Pa), the relative 
interior of the marginal polytope; 

2. compute supp(p), the support of p, where supp(a-) = {i: Xi ^= 0}. Clearly, this second task is nontrivial 
only when t is a point on the relative boundary of Pa, denoted by rb(PA), and we are interested in the 
extended MLE. 

Both problems require handling the geometric and combinatorial properties of the marginal polytope Pa 
and of its faces, and to relate any point in rb(PA) to the boundary of Sa- In general, this is challenging, as 
the combinatorial complexity of Minkowki sums could be very high. Fortunately, we can simplify these tasks 
by resorting to a larger polyhedron with which it is easier to work. 

Let Ca = conc(A) be the marginal cone (see Eriksson et al., 2006), which is easily seen to be the convex 
hull of all the possible observable sufficient statistics if there were no constraints on the number of possible 
observations per dyad. Notice that, since the columns of A are affinely independent, they define the extreme 
rays of Ca- Following Geiger et al. (2006), a set T C {1,2,..., 2n(n — 1)} is called a facial set of Ca if there 
exists a c such that 

c T a, = 0, Vi G T and c T a l < 0, Mi T , 

where a; indicates the i-th column of A. Then, F is face of Ca if and only if F = cone({ai : i e J 7 }), for 
some facial set T of Ca, and that there is a one-to-one correspondence between the facial sets and the faces 
olC A - 

In our main result of this section, we show that the existence of the MLE can be determined using the 
simpler set Ca and that the supports of the points on the boundary of Sa are facial sets of Ca- 

Theorem 3.4. Let t e Pa- Then, t e ri(P,t) if and only ift e ri(CA). Furthermore, for every face G of Pa there 
exists one facial set T of Ca such that T = supp(p), where p = for each t e ri(G). 

Proof. By Lemma 3.3, Pa C Ca- Thus, since both Pa and Ca are closed, t e ti(Pa) implies t £ ri(CU). For 
the converse statement, suppose t belongs to a proper face G of Pa- Then, by Theorem 3.1, 

t = tx t 2 + £l,3 + • ■ • + tn-l.n (8) 

where ti.j belongs to a face of conv(A ij ), for each i < j. We now point out two crucial features of Pa that 
can be readily checked: 
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(i) the first (™) coordinates of any point in Pa are all equal to 1; 

(ii) the first (™) coordinates of each ti t j are all zeros except one, which takes on the value 1. 

Suppose now that t G ti(Ca)- Because of (i) and (ii), there exists a point in x G D n with strictly positive 
entries such that t = Ax. In turn, this implies that 

t = t 12 + t 1 3 + . . . + t n _ l nl 

where t'^ 6 ri(conv(Ajj)) for each i < j, which contradicts (8). Thus, t G rb(C'A). To prove the second 
claim, notice that, because of the uniqueness of the decomposition in Theorem 3.1, our arguments further 
yield that, for every proper face G of Pa, there exists one face F of Ca such that ri(G) C ri(F). Consequently, 
to every face G of Pa corresponds one facial set T of Ca such that, if t G ri(G), then 

t = A P (9) 

for some p G D n with supp(p) = T . Since columns of A are affinely independent, this p is unique. By 
Theorem 3.2, there exists a unique point p' = n~ x {t) G Sa satisfying (9), so that supp(p') C T. Inspection 
of the proof of the same theorem yields that, in fact, supp(p') = T, so that p' = p. See also the appendix in 
Geiger et al. (2006). ■ 

From the algorithmic standpoint, the previous theorem allows us to work directly with Ca, whose face 
lattice is much simpler. For example, while we know the extreme rays of Ca, it is highly nontrivial to find 
the vertices of Pa among 4"(™ -1 ) vectors of observable sufficient statistics. Algorithms for deciding whether 
t G ri(CU) and for finding the facial set corresponding to the face F of Ca such that t G n(F) are presented 
in Eriksson et al. (2006) and Rinaldo (2006). 

We conclude this section with a statistical remark. Using the terminology of log-linear modeling theory 
(see, e.g., Bishop et al., 2007)) p\ models are log-linear models arising from a product-Multinomial sampling 
scheme. This additional constraint is what makes dealing with Pa (and, as we showed in Petrovic et al., 2010, 
computing the Markov bases) particularly complicated in comparison with the same tasks under Poisson or 
Multinomial sampling scheme. In these simpler sampling settings, the toric ideal I a is homogenous, the 
model Sa is homoemorphic to Ca for the Poisson scheme, and to conv(^4) for the Multinomial scheme 
and all Markov moves are applicable (provided that their degree is smaller than J2 i Xi for the Multinomial 
scheme) . 

3.3 Computations 

We now briefly summarize some of the computations involving Pa and Ca we carried out based on polymake 
(see Gawrilow and Joswig, 2000)) and minksum (see Weibel, 2005). Even though, due to the high compu- 
tational complexity, this can only be a rather limited account, we believe it provides some indications of the 
complexity of pi models and of the subtleties of the maximum likelihood estimation problem. 

We first indicate how non-existence of the MLE and the determination of the appropriate facial set can 
be addressed using simple linear programming. While checking for the existence of the MLE is relatively 
inexpensive, the second task is more demanding. We refer the interested readers to Rinaldo (2006) for 
further details and more efficient algorithms. In order to decide whether the MLE exists it is sufficient to 
establish whether the observed sufficient statistics t belong to the relative interior of Pa, which, by Theorem 
3.4, happens if and only if it belongs to the relative interior of Ca- In turn, we can decide this by solving the 
following simple linear program: 

maxs 
s.t. Ax = t 

Xi — s > 
s > 0, 

where the scalar s and vector x are the variables. At the optimum (s*,x*), the MLE exists if and only if 
s* > 0. Though very simple, this algorithm may not be sufficient to compute the support of p if the MLE does 
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Table 1 : Number of vertices for the polytopes Pa for different specifications of the pi model and different 
network sizes. Computations carried out using minksum Weibel (2005). The last column indicates the 
number of columns of the design matrix A, which correspond to the number of generators of Ca- 
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Table 2: Number of facets, dimensions and ambient dimensions of the the cones Ca for different specifica- 
tions of the pi model and different network sizes. 

not exist. To this end, we need to resort to a more sophisticated algorithm. Let en denote the i-th column of 
A, and consider the following 2n(n — 1) programs, one for each column of A: 

max(a.j,y} 
s.t. y T t = 
A T y > 

-l < y < i 

Let y* denote the solution to the i-th program (notice that y* is a vector). Then, the MLE does not exist if 
and only if {en, y*i) > for some i, in which case the facial set associated with t is given by 

{i: (fi,y*) = 0}. 

We now provide some numerical evidence on how Theorem 3.4 significantly simplifies both tasks. Table 
3.3 displays the number of vertices of the polytopes Pa for the three model specifications described in 
Section 2 and various networks sizes. The last column of the table contains the number of columns of the 
design matrix, which is also the number of extreme rays of the marginal cone Ca ■ In comparison, the number 
of vertices of Pa is very hard to compute; it is very large and grows extremely fast with n. 

In Table 3.3 we report the number of facets, dimensions and ambient dimensions of the cones Ca for 
different values of n and for the three specification of the reciprocity parameters pi_j we consider here. 
Though this only provides an indirect measure of the complexity of these models and of the non-zero patterns 
in extended MLEs, it does show how quickly the complexity of p\ models may scale with the network size n. 

Another point of interest is the assessment of how often the MLE exists. In fact, because of the product 
Multinomial sampling constraint, nonexistence of the MLE is quite severe, especially for smaller networks. 
Below we report our finding: 

1. n = 3. 

The sample space consists of 4 3 = 64 possible networks. When pij = for all i and j, there are 63 
different observable sufficient statistics, only one of which belongs to ti(Pa)- Thus, only one of the 
63 observable sufficient statistics leads to the existence of the MLE. The corresponding fiber contains 
only two networks: 001001000010 and 01000010010 0, which encode the 
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incidence matrices 



x 1 
1x0 
1 x 



and 



1 

x 1 
x 



respectively. In both cases, the associated MLE is the 12-dimensional vector with entries all equal to 
0.25. Incidentally, the marginal polytope Pa has 62 vertices and 30 facets. When pij = p / or 
pi,j = Pi + Pj the MLE never exists. 

2. n = 4. 

The sample space contains 4096 observable networks. If pi^ = 0, there are 2,656 different observable 
sufficient statistics, only 64 of which yield existent MLEs. Overall, out of the 4096 possible networks, 
only 426 have MLEs. When pij = p ^ 0, there are 3, 150 different observable sufficient statistics, only 
48 of which yield existent MLEs. Overall, out of the 4096 possible networks, only 96 have MLEs. When 
Pi,j = Pi + Pj, there are 3150 different observable sufficient statistics and the MLE never exists. 

3. n = 5. 

The sample space consists of 4 10 = 1,048,576 different networks. If pij = 0, there are 225,025 
different sufficient statistics, and the MLE exists for 7, 983. If pij = p ^ the number of distinct 
possible sufficient statistics is 349, 500, and the MLE exists in 12, 684 cases. Finally, when pij = pi + pj, 
the number of different sufficient statistics is 583, 346 and the MLE never exists. 



3.3.1 The case p = 

When p = 0, nonexistence of the MLE can be more easily detected. Even though this is the simplest of pi 
models we consider, the observations below apply to more complex p\ models as well, since nonexistence of 
the MLE for the case p = implies nonexistence of the MLE in all the other cases. Furthermore, by setting 
p = 0, we recover the random Rasch matrix model for exchangeable binary arrays discussed in Lauritzen 
(2008). Furthermore, the undirected version of this pi model corresponds to the random graph models 
with independent dyads and the degree sequence as the vector of sufficient statistics, recently studied by 
Chatterjee et al. (2010). 

Only for this case, it is more convenient to switch to the parametrization originally used in Holland and Leinhardt 
(1981), and describe networks using incidence matrices. Thus, each network on n nodes can be represented 
as a n x n matrix with 0/1 off-diagonal entries, where the entry is 1 if there is an edge from i to j, 
and otherwise. In this parametrization, the sufficient statistics for the parameters {ai,i = 1, . . . ,n} and 
= 1, . . . , n} are the row and column sums, respectively. Just like with the other pi models, the minimal 
sufficient statistic for the density parameter is the total number of edges, which is a linear function of the 
sufficient statistics for the other parameters; because of this, it can be ignored. 

When p = 0, there are three cases where the MLE does not exist. The first two are immediate, while the 
third one is quite subtle. In order to describe them we recall that, from the standard theory of exponential 
families (Lauritzen, 1996, see, e.g.,)), the MLE p viewed as a n x n incidence matrix satisfies the moment 
equations, namely the row and column sums of p match the corresponding row and column sums of the 
observed network x. Thus, the MLE does not exist whenever this constraint cannot be satisfied by any 
strictly positive vector. As a result, for an observed network x, the MLE is non-existent if x contains zeros 
in certain positions such that there does not exist any vector z with strictly positive coordinates satisfying 
Ax = Az. To this end, we consider the 2n x n(n — 1) sub-matrix B of the design matrix A obtained by 
including only the columns corresponding to the probabilities Pij(l, 0) and Pij(0, 1), for all i < j and only 
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the rows corresponding to the parameters {on, i = 1, . . . , n} and j = 1 . . . , n}. For example, when n = 3, 



10 10 
10 10 
1 1 

10 10 

1 1 
10 10 

where its columns correspond to the probabilities £1^(1,0), ^1,2(0,1), pi,3(l,0), ^1,3(0, 1), P2, 3(1,0) and 
^2,3(0, 1). Since the columns of B are affinely independent, in order to determine the patterns of zeros 
leading to a nonexistent MLE, it is sufficient to look at the facial sets of the pointed polyhedral cone cone(B). 

Using the strategy outlined above, we can distinguish three separate cases for which the MLE does not 
exist. The first two cases are obvious, since they correspond, respectively, to the maximal and minimal 
number of l's that can be observed given the dyadic multinomial constraints. 

1. A row or column sum is equal to n — 1. If row i sums to n — 1, then pij(l, 0) = 1 for all j, which implies 
that pij (0,1) = pij (0,0) = 0, for all j . Similarly, if column i sums to n — 1, (0,1) = 1 for all j, which 
implies that (1, 0) = pij (0, 0) = 0, for all j. 

2. A row or column sum is 0. If row i has a zero sum, then pY/(l, 0) = p%j(l, 1) = 0, for all j. Similarly, if 
column i has a zero sum, then (0, 1) = p%j(l, 1) = 0, for all j. 

3. The last case is much less obvious and does not appear to be captured by a general rule, so we provide 
two instances of it for the cases n — 3 and n = 4. When n = 3, it is easy to see that a MLE with positive 
coordinate cannot exist if any of the following patterns of zeros are observed in any network x: 











Notice that the occurrence of a zero in the positions indicated above does not necessarily imply any of 
the previous two cases. When n = 4, there are 4 patterns of zeros leading to a nonexistent MLE, even 
though the margins can be positive and smaller than 3: 



X 











X 









X 








X 



x 
x 







x 



X 










X 







X 











X 



x 
0x0 
x 



As it turns out, we can show that that these are the same patterns of zeros that are relevant to the 
existence of the MLE for the model of quasi-independence for two-way contingency tables and the 
Rasch model in which the number of subjects equals the number of items. 

Based on our computations carried out in polymake for networks of size up to n = 10, we have the 
following conjecture: 

Conjecture 3.5. For a network on n nodes, coiic(B) has 3n facets, 2n of which correspond to patterns of zeros 
leading to a zero row or column margin, and the remaining n to patterns of zeros which cause the MLE not to 
exist without inducing zero margins. 

Finally, we point out that that the matrix B described above plays a crucial role in describing the Markov 
bases for p\ model as we described in our previous work Petrovic et al. (2010), where we refer to B as the 
common submatrix ofall the p\ models. Indeed, in Theorem 2.1 and two other related results summarized in 
Section 2.1, we showed that the Markov basis of the toric ideal generated by B essentially determines the 
Markov bases for the simplified models. 
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